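Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. The empirical success of distributional RL depends on the representation of return distributions and the choice of distribution divergence. In this paper, we propose a new class of Sinkhorn distributional RL (SinkhornDRL) algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then use Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. The Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy (MMD). SinkhornDRL finds a sweet spot by exploiting the geometry of optimal-transport-based distances and the unbiased gradient estimates of MMD. Finally, SinkhornDRL is shown to perform competitively with state-of-the-art algorithms on a suite of 55 Atari games.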
Translated by Google Translate
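The Sinkhorn iterations mentioned in the SinkhornDRL abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional return samples with uniform weights, a squared-distance cost, and a debiased divergence built from the regularized transport cost. The function names `sinkhorn_cost` and `sinkhorn_divergence` and the parameters `eps` and `n_iters` are hypothetical.

```python
import math


def sinkhorn_cost(xs, ys, eps=1.0, n_iters=200):
    """Transport cost of the entropy-regularized plan between two sample sets.

    Assumptions (not from the abstract): uniform sample weights and a
    squared-distance ground cost.
    """
    n, m = len(xs), len(ys)
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m  # uniform marginals
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan entries are u[i] * kernel[i][j] * v[j]; sum plan * cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def sinkhorn_divergence(xs, ys, eps=1.0, n_iters=200):
    """Debiased divergence: subtract the two self-transport terms."""
    return (
        sinkhorn_cost(xs, ys, eps, n_iters)
        - 0.5 * sinkhorn_cost(xs, xs, eps, n_iters)
        - 0.5 * sinkhorn_cost(ys, ys, eps, n_iters)
    )


current_returns = [0.0, 1.0, 2.0]
target_returns = [5.0, 6.0, 7.0]
divergence = sinkhorn_divergence(current_returns, target_returns)
```

In this sketch a small `eps` pushes the plan toward the unregularized optimal-transport solution, while a large `eps` blurs it toward the independent coupling, which gives some intuition for the interpolation the abstract describes.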
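We propose an efficient, interpretable neuro-symbolic model for solving Inductive Logic Programming (ILP) problems. The model is built from a set of meta-rules organized in a hierarchical structure, and it invents first-order rules by learning embeddings that match facts to the body predicates of a meta-rule. To instantiate it, we design an expressive set of generic meta-rules and show that they generate a fragment of Horn clauses. During training, we inject controlled Gumbel noise to avoid local optima and employ an interpretability-regularization term to further guide convergence to interpretable rules. We empirically validate our model on various tasks (ILP, Visual Genome, reinforcement learning) against several state-of-the-art methods.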
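Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieving higher interpretability in reinforcement learning (RL). To that end, we distinguish interpretability (a property of a model) from explainability (a post-hoc operation involving the intervention of a proxy) and discuss both in the context of RL, with an emphasis on the former. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Following this scheme, we summarize and analyze recent work related to interpretable RL, focusing on papers published in the past 10 years. We also briefly discuss some related research areas and point to some potentially promising research directions.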
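With current technology node scaling, an accurate prediction model at early design stages can significantly shorten the design cycle. In particular, during logic synthesis, predicting the cell congestion caused by improper logic combination can reduce the burden on subsequent physical implementation. There have been attempts to use Graph Neural Network (GNN) techniques for congestion prediction at the logic synthesis stage. However, they require informative cell features to achieve reasonable performance, because the core idea of GNNs is built on the message-passing framework, and such features are impractical to obtain at the early logic synthesis stage. To address this limitation, we propose a framework that directly learns embeddings for a given netlist to improve the quality of the node features. Popular random-walk-based embedding methods such as Node2vec, LINE, and DeepWalk suffer from cross-graph alignment problems and poor generalization to unseen netlist graphs, which leads to inferior performance and costly runtime. In our framework, we introduce a superior alternative that uses matrix factorization to obtain node embeddings that generalize across netlist graphs. We propose an efficient mini-batch training method at the sub-graph level that guarantees parallel training and satisfies the memory constraints of large-scale netlists. We present results obtained with open-source EDA tools such as the DREAMPlace and OpenROAD frameworks on a variety of openly available circuits. By combining the embeddings learned on the netlist with GNNs, our method improves prediction performance, generalizes to new circuit lines, and is efficient to train, potentially saving over 90% of runtime.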
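Shift neural networks reduce computational complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values, which makes them fast and energy-efficient compared with conventional neural networks. However, existing shift networks are sensitive to weight initialization and also suffer degraded performance caused by vanishing gradients and the weight sign freezing problem. To address these issues, we propose a low-bit re-parameterization, a novel technique for training low-bit shift networks. Our method decomposes a discrete parameter in a three-fold sign-sparse-shift manner. In this way, it efficiently learns a low-bit network whose weight dynamics are similar to those of full-precision networks and which is insensitive to weight initialization. Our proposed training method pushes the boundaries of shift neural networks and reports results for 3-bit shift networks in terms of top-1 accuracy on ImageNet.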
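To illustrate why a sign-sparse-shift weight removes multiplications, as described in the shift-network abstract above, here is a minimal sketch. It assumes, as an illustration only, that each weight equals sign × sparse × 2^shift with a non-negative integer shift and integer activations. The abstract does not specify the exact encoding, and the names `decode_weight` and `shift_multiply` are hypothetical.

```python
def decode_weight(sign_bit, sparse_bit, shift):
    """Decode a weight from three discrete parts (assumed encoding).

    sign_bit:   1 for a positive weight, 0 for a negative weight
    sparse_bit: 0 forces the weight to zero, 1 keeps it
    shift:      non-negative exponent, so the magnitude is 2 ** shift
    """
    sign = 1 if sign_bit else -1
    return sign * sparse_bit * (1 << shift)


def shift_multiply(activation, sign_bit, sparse_bit, shift):
    """Compute activation * weight using only a bit shift and a negation."""
    if not sparse_bit:
        return 0  # pruned weight contributes nothing
    shifted = activation << shift  # multiply by 2 ** shift
    return shifted if sign_bit else -shifted


# The shift-based product matches ordinary multiplication by the decoded weight.
product = shift_multiply(5, sign_bit=1, sparse_bit=1, shift=2)
reference = 5 * decode_weight(sign_bit=1, sparse_bit=1, shift=2)
```

Here both `product` and `reference` evaluate to 20, since the decoded weight is 4.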
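The integration of reasoning, learning, and decision-making is key to building more general AI systems. As a step in this direction, we propose a novel neural-logic architecture that can solve both inductive logic programming (ILP) and deep reinforcement learning (RL) problems. Our architecture defines a restricted but expressive continuous space of first-order logic programs by assigning weights to predicates instead of rules. It is therefore fully differentiable and can be trained efficiently with gradient descent. In addition, for the deep RL setting with actor-critic algorithms, we propose a novel, efficient critic architecture. Compared with state-of-the-art methods on both ILP and RL problems, our proposal achieves excellent performance while providing a fully interpretable solution and scaling much better, especially during the testing phase.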
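We study the Policy-extended Value Function Approximator (PeVFA) in reinforcement learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. This extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing property: value generalization among policies. We formally analyze value generalization under Generalized Policy Iteration (GPI). From both theoretical and empirical perspectives, we show that the generalized value estimates offered by PeVFA may have lower initial approximation error with respect to the true values of successive policies, which is expected to improve successive value approximation during GPI. Based on these clues, we introduce a new form of GPI with PeVFA that leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches for learning effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and of policy representation learning on several OpenAI Gym continuous control tasks. As a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about a 40% performance improvement over its vanilla counterpart in most environments.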
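The idea of conditioning a value function on an explicit policy representation, described in the PeVFA abstract above, can be sketched as follows. This is only an illustration under strong assumptions: a linear approximator over concatenated features stands in for whatever network the authors use, and the policy embeddings are made-up vectors. The name `pevfa_value` and all numbers are hypothetical.

```python
def pevfa_value(state, policy_embedding, weights, bias=0.0):
    """Linear value estimate over the concatenation [state, policy_embedding].

    A conventional VFA would see only `state`; the policy-extended variant
    also receives a vector describing the policy being evaluated.
    """
    features = list(state) + list(policy_embedding)
    if len(features) != len(weights):
        raise ValueError("weights must match state plus embedding length")
    return sum(w * f for w, f in zip(weights, features)) + bias


# One shared set of parameters evaluates the same state under two policies.
weights = [0.5, -1.0, 2.0, 1.0]
state = [1.0, 2.0]
embedding_policy_a = [0.0, 1.0]  # illustrative embedding of policy A
embedding_policy_b = [1.0, 0.0]  # illustrative embedding of policy B

value_a = pevfa_value(state, embedding_policy_a, weights)
value_b = pevfa_value(state, embedding_policy_b, weights)
```

Because the policy embedding is part of the input, a single approximator can hold different values for the same state under different policies, which is what makes value generalization among policies possible.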
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose linking object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.